Supporting the Optimisation of Distributed Data Mining by Predicting Application Run Times
Abstract
There is an emerging interest in optimisation strategies for distributed data mining in order to improve response time. Optimisation techniques operate by first identifying the factors that affect performance in distributed data mining, then computing or assigning a "cost" to those factors for alternative scenarios or strategies, and finally choosing the strategy that involves the least cost. In this paper we propose application run time estimation as a solution to estimating the cost of performing a data mining task in different distributed locations. A priori knowledge of the response time provides a sound basis for optimisation strategies, particularly if there are accurate techniques for obtaining such knowledge. We present a novel rough-sets-based technique for predicting the run times of applications, together with experimental validation of its prediction accuracy when estimating the run times of data mining tasks.
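The cost-based selection step described in the abstract can be illustrated with a minimal sketch. It assumes that a run time estimate is already available for each candidate location; the rough-sets-based predictor itself is not reproduced here. The names `Strategy`, `choose_strategy`, the site labels, and all numeric values are hypothetical and chosen only for illustration, as is the choice of adding a data transfer time to the predicted run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """One candidate way of running a data mining task (hypothetical model)."""
    location: str
    predicted_run_time_s: float   # assumed output of some run time predictor
    data_transfer_time_s: float   # assumed time to move the data to that location

    @property
    def cost(self) -> float:
        # Illustrative cost: estimated response time in seconds.
        return self.predicted_run_time_s + self.data_transfer_time_s


def choose_strategy(strategies: list[Strategy]) -> Strategy:
    """Return the strategy with the least estimated cost."""
    if not strategies:
        raise ValueError("at least one candidate strategy is required")
    return min(strategies, key=lambda s: s.cost)


# Made-up estimates for three candidate locations.
candidates = [
    Strategy("site_a", predicted_run_time_s=120.0, data_transfer_time_s=0.0),
    Strategy("site_b", predicted_run_time_s=60.0, data_transfer_time_s=30.0),
    Strategy("site_c", predicted_run_time_s=45.0, data_transfer_time_s=80.0),
]

best = choose_strategy(candidates)
print(best.location, best.cost)
```

The quality of the choice depends entirely on the accuracy of the estimates fed in, which is why the abstract stresses accurate run time prediction.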
Similar resources
On the development of a stochastic optimisation algorithm with capabilities for distributed computing
In this thesis, we devise a new stochastic optimisation method (cascade optimisation algorithm) by incorporating concepts from Markov processes whilst eliminating the inherent sequential nature that is the major deficit preventing the exploitation of advances in distributed computing infrastructures. This method introduces partitions and pools to store intermediate solution and corresponding ...
Application of Machine Learning Approaches in Rainfall-Runoff Modeling (Case Study: Zayandeh_Rood Basin in Iran)
Runoff resulting from rainfall is the main source of water in most parts of the world. Therefore, prediction of the runoff volume resulting from rainfall is becoming increasingly important in the control, harvesting and management of surface water. In this research a number of machine learning and data mining methods including support vector machines, regression trees (CART algorithm), model tree...
Hydrograph Modeling Using SGSim: A Case Study of Behbahan Aquifer, Southwest of Iran
Hydrograph modeling and prediction of groundwater levels are the main concerns of most hydrogeological calculations and water resource management process. The present study is an application of Sequential Gaussian Simulation (SGSim) method for predicting groundwater levels using recorded monthly data (180 months) related to 21 piezometers of Behbahan aquifer, southwest of Iran. To generate real...
The WebSocket API as supporting technology for distributed and agent-driven data mining
Supporting technologies play an important role in distributed data mining systems. The flexibility and the scalability of infrastructures and architectures can often determine the strength of a distributed data mining framework. In this paper we present some preliminary research work on a prototype for a distributed data mining framework. We shall show how the WebSocket API, which is a draft sp...
A case study for application of fuzzy inference and data mining in structural health monitoring
In this study, a system for monitoring the structural health of a bridge deck and predicting various possible damages to this section was designed based on measuring the temperature and humidity with the use of wireless sensor networks, and then it was implemented and investigated. A scaled model of a conventional medium sized bridge (length of 50 meters, height of 10 meters, and with 2 piers) wa...
Publication date: 2002